Auditory-based filter-bank analysis as a front-end processor for speech recognition
Authors
Abstract
This paper compares speech analysis based on human auditory processing with conventional LPC analysis. The two types of parameters were compared on their ability to recognize fourteen consonants extracted from Japanese consonant-vowel (CV) syllables spoken in isolation. Three types of recognition algorithms were used: dynamic time warping with multiple template sets, hidden Markov models, and neural networks. The auditory system consisted of 35 channels spanning 100 to 5400 Hz, each of which consisted of a critical bandpass filtering process, a rectification process, an integration process, and a transformation into logarithmic form. A lateral inhibition process was also included to simulate human auditory processing more closely. The recognition experiments showed that parameters based on the features of human auditory processing are excellent for use in various types of speech recognition methods.
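The per-channel chain described in the abstract (bandpass filtering, rectification, integration, logarithm, then lateral inhibition across channels) can be sketched roughly as follows. Only the channel count (35) and the frequency span (100 to 5400 Hz) come from the abstract. Everything else is an assumption made for illustration: the Bark-scale channel spacing (Traunmüller's formula), the Zwicker-Terhardt critical-bandwidth formula, the second-order biquad bandpass, half-wave rectification, 10 ms frame averaging as the integrator, the 16 kHz sampling rate, the inhibition weight, and all function names. It is not the authors' implementation.

```python
import math

NUM_CHANNELS = 35                 # from the abstract
F_LOW, F_HIGH = 100.0, 5400.0     # from the abstract (Hz)

def hz_to_bark(f):
    # Assumed spacing: Traunmueller's Bark approximation.
    return 26.81 * f / (1960.0 + f) - 0.53

def bark_to_hz(z):
    return 1960.0 * (z + 0.53) / (26.28 - z)

def center_frequencies(n=NUM_CHANNELS, lo=F_LOW, hi=F_HIGH):
    # Channel centers equally spaced on the (assumed) Bark scale.
    z_lo, z_hi = hz_to_bark(lo), hz_to_bark(hi)
    step = (z_hi - z_lo) / (n - 1)
    return [bark_to_hz(z_lo + k * step) for k in range(n)]

def critical_bandwidth(f):
    # Assumed bandwidth: Zwicker-Terhardt critical-bandwidth approximation (Hz).
    return 25.0 + 75.0 * (1.0 + 1.4 * (f / 1000.0) ** 2) ** 0.69

def bandpass(signal, f0, bw, fs):
    # Second-order bandpass biquad with unity gain at f0 (assumed filter shape).
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) * bw / (2.0 * f0)
    a0 = 1.0 + alpha
    b0 = alpha / a0
    a1 = -2.0 * math.cos(w0) / a0
    a2 = (1.0 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        y = b0 * (x - x2) - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

def log_envelope(channel, frame_len, eps=1e-8):
    # Half-wave rectify, integrate by averaging over each frame, take the log.
    frames = []
    for start in range(0, len(channel) - frame_len + 1, frame_len):
        rectified = [max(v, 0.0) for v in channel[start:start + frame_len]]
        frames.append(math.log(sum(rectified) / frame_len + eps))
    return frames

def lateral_inhibition(frame, weight=0.5):
    # Subtract a weighted mean of the two neighbouring channels (edges replicated).
    n = len(frame)
    out = []
    for k in range(n):
        left = frame[max(k - 1, 0)]
        right = frame[min(k + 1, n - 1)]
        out.append(frame[k] - weight * 0.5 * (left + right))
    return out

def auditory_features(signal, fs=16000, frame_ms=10.0):
    # Returns a list of frames, each holding one value per channel.
    frame_len = int(fs * frame_ms / 1000.0)
    per_channel = [
        log_envelope(bandpass(signal, f0, critical_bandwidth(f0), fs), frame_len)
        for f0 in center_frequencies()
    ]
    num_frames = len(per_channel[0])
    return [
        lateral_inhibition([per_channel[c][t] for c in range(NUM_CHANNELS)])
        for t in range(num_frames)
    ]
```

Calling `auditory_features(samples, fs)` on a list of float samples yields one 35-value vector per 10 ms frame, which could then be passed to a recognizer such as the DTW, HMM, or neural-network methods named in the abstract.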
Similar resources
An Efferent-Inspired Auditory Model Front-End for Speech Recognition
In this paper, we investigate a closed-loop auditory model and explore its potential as a feature representation for speech recognition. The closed-loop representation consists of an auditory-based, efferent-inspired feedback mechanism that regulates the operating point of a filter bank, thus enabling it to dynamically adapt to changing background noise. With dynamic adaptation, the closed-loop...
Auditory Based Feature Vectors for Speech Recognition Systems
The signal-processing front end that extracts the feature set is an important stage in any speech recognition system. The optimum feature set has not yet been settled, despite the vast efforts of researchers. There are many types of features, which are derived differently and have a good impact on the recognition rate. This paper presents one more successful technique to extract the feature set from a...
Improving the filter bank of a classic speech feature extraction algorithm
The most popular speech feature extractor used in automatic speech recognition (ASR) systems today is the mel frequency cepstral coefficient (mfcc) algorithm. Introduced in 1980, the filter bank-based algorithm eventually replaced linear prediction cepstral coefficients (lpcc) as the premier front end, primarily because of mfcc’s superior robustness to additive noise. However, mfcc does not app...
Clean speech reconstruction from MFCC vectors and fundamental frequency using an integrated front-end
The aim of this work is to enable a noise-free time-domain speech signal to be reconstructed from a stream of MFCC vectors and fundamental frequency and voicing estimates, such as may be received in a distributed speech recognition system. To facilitate reconstruction, both a sinusoidal model and a source-filter model of speech are compared by listening tests and spectrogram analysis, with the ...
A Weighted Overlap Add-based Front-end for Speech Recognition
Speech signal enhancement is frequently referred to as a preprocessing step to speech recognition. However, in practice, this cannot be easily accomplished since the front-end signal processing techniques and/or parameters used in these two frequently differ. We apply a signal processing technique successfully used in speech enhancement to speech recognition and show that it can perform equally...
Publication date: 1989